feat: Phase 3a — merge metadata aggregation, message types, replaced_split_ids #6352
Open
g-talbot wants to merge 1 commit into gtt/merge-output-split-metadata
…it_ids (Phase 3a)

Phase 3 pipeline integration, first PR:

- merge_parquet_split_metadata(): aggregates input split metadata with MergeOutputFile physical metadata to produce complete ParquetSplitMetadata for merged output. Validates invariant fields, unions metric_names and tags, finalizes tag cardinality after merge. 17 tests.
- ParquetNewSplits, ParquetMergeTask, ParquetMergeScratch message types for the merge actor chain (planner → scheduler → downloader → executor).
- Add replaced_split_ids to ParquetSplitBatch and propagate through ParquetUploader (was hardcoded Vec::new()). Enables merge executor to specify which splits are being replaced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Phase 3 (pipeline integration), first PR. Building on Phase 1 (merge engine, #6335) and Phase 2 (merge policy, #6351).
- `merge_parquet_split_metadata()`: aggregates input split metadata with `MergeOutputFile` physical metadata to produce complete `ParquetSplitMetadata` for merged output. Validates invariant fields (kind, index_uid, partition_id, sort_fields, window), unions metric_names and tags, and finalizes tag cardinality after merge. 17 unit tests.
- `ParquetNewSplits`, `ParquetMergeTask`, `ParquetMergeScratch`: message types for the merge actor chain (planner → scheduler → downloader → executor).
- `replaced_split_ids`: added to `ParquetSplitBatch` and propagated through `ParquetUploader` (was hardcoded `Vec::new()`). Enables the merge executor to specify which splits are being replaced during atomic publish-and-replace.

Test plan
- `merge_parquet_split_metadata()` unit tests
- `ParquetUploader` tests pass with new field
- `cargo clippy` clean, `cargo doc` compiles, license headers OK

🤖 Generated with Claude Code
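
For reviewers who want a mental model of the aggregation described in the Summary, here is a minimal sketch of the general shape: check that invariant fields agree across inputs, union the metric names and tags, take physical metadata from the merge output, and collect the input split IDs as the ones being replaced. It is not the code in this PR. `SplitMeta`, `OutputFile`, `MergeError`, and `merge_split_metadata` are hypothetical, simplified stand-ins; the real `ParquetSplitMetadata` and `MergeOutputFile` types have fields not shown here, and the `kind` check and tag-cardinality finalization are omitted because their details are not given in this description.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified stand-in for ParquetSplitMetadata.
#[derive(Debug, Clone, PartialEq)]
struct SplitMeta {
    split_id: String,
    index_uid: String,
    partition_id: u64,
    sort_fields: String,
    window: (i64, i64),
    metric_names: BTreeSet<String>,
    tags: BTreeMap<String, BTreeSet<String>>,
    num_rows: u64,
}

// Hypothetical stand-in for the physical metadata carried by MergeOutputFile.
struct OutputFile {
    split_id: String,
    num_rows: u64,
}

#[derive(Debug, PartialEq)]
enum MergeError {
    NoInputs,
    InvariantMismatch(&'static str),
}

// Returns the merged metadata plus the IDs of the input splits it replaces.
fn merge_split_metadata(
    inputs: &[SplitMeta],
    output: &OutputFile,
) -> Result<(SplitMeta, Vec<String>), MergeError> {
    let first = inputs.first().ok_or(MergeError::NoInputs)?;

    // Invariant fields must be identical across every input split.
    for meta in &inputs[1..] {
        if meta.index_uid != first.index_uid {
            return Err(MergeError::InvariantMismatch("index_uid"));
        }
        if meta.partition_id != first.partition_id {
            return Err(MergeError::InvariantMismatch("partition_id"));
        }
        if meta.sort_fields != first.sort_fields {
            return Err(MergeError::InvariantMismatch("sort_fields"));
        }
        if meta.window != first.window {
            return Err(MergeError::InvariantMismatch("window"));
        }
    }

    // Union metric names and per-key tag values across all inputs.
    let mut metric_names = BTreeSet::new();
    let mut tags: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for meta in inputs {
        metric_names.extend(meta.metric_names.iter().cloned());
        for (key, values) in &meta.tags {
            tags.entry(key.clone())
                .or_default()
                .extend(values.iter().cloned());
        }
    }

    // Physical fields come from the merge output, not from the inputs.
    let merged = SplitMeta {
        split_id: output.split_id.clone(),
        index_uid: first.index_uid.clone(),
        partition_id: first.partition_id,
        sort_fields: first.sort_fields.clone(),
        window: first.window,
        metric_names,
        tags,
        num_rows: output.num_rows,
    };

    let replaced_split_ids = inputs.iter().map(|m| m.split_id.clone()).collect();
    Ok((merged, replaced_split_ids))
}
```

In this sketch the replaced IDs are returned alongside the merged metadata, so a caller could forward them on a batch message instead of passing an empty vector. Whether the PR wires it that way is not stated here.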